🚀 Pipeline Results Report

Generated: /root/verbalize-sampling

Experiments: 4

Metrics: diversity, ngram, joke_quality

📋 Experiment Configurations

Name | Task | Method | Model | Responses | Temperature
direct (samples=1) | joke | direct | anthropic/claude-3.7-sonnet | 30 | 0.7
sequence [strict] (samples=5) | joke | sequence | anthropic/claude-3.7-sonnet | 30 | 0.7
multi_turn [strict] (samples=5) | joke | multi_turn | anthropic/claude-3.7-sonnet | 30 | 0.7
structure_with_prob [strict] (samples=5) | joke | structure_with_prob | anthropic/claude-3.7-sonnet | 30 | 0.7

📝 Generation Examples

direct (samples=1)

Method: direct
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms?

Because they make up everything!
Method: direct
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms?

Because they make up everything!
Method: direct
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms?

Because they make up everything!

sequence [strict] (samples=5)

Method: sequence
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!
Method: sequence
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!
Method: sequence
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

multi_turn [strict] (samples=5)

Method: multi_turn
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms?

Because they make up everything!
Method: multi_turn
Task: joke
Prompt:
Tell me a joke.
Response:
Did you hear about the mathematician who's afraid of negative numbers?

He'll stop at nothing to avoid them!
Method: multi_turn
Task: joke
Prompt:
Tell me a joke.
Response:
What do you call a fake noodle? An impasta!

structure_with_prob [strict] (samples=5)

Method: structure_with_prob
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!
Method: structure_with_prob
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!
Method: structure_with_prob
Task: joke
Prompt:
Tell me a joke.
Response:
Why don't scientists trust atoms? Because they make up everything!

📊 Results Summary

Diversity Results

Note: the detailed pairwise_diversities array is omitted from this table.
Experiment | Average Diversity | Min Diversity | Max Diversity | Std Diversity | Average Response Length | Average Unique Words | Average Vocabulary Richness | Total Cost
direct (samples=1) | 0.1077 | 0.0000 | 0.4850 | 0.1519 | 12.4533 | 12.2267 | 0.9907 | 0.0000
sequence [strict] (samples=5) | 0.3170 | 0.0000 | 0.4849 | 0.1340 | 13.9800 | 13.4533 | 0.9723 | 0.0001
multi_turn [strict] (samples=5) | 0.2853 | 0.0000 | 0.4922 | 0.1226 | 13.9000 | 13.3933 | 0.9726 | 0.0001
structure_with_prob [strict] (samples=5) | 0.3231 | 0.0000 | 0.4955 | 0.1334 | 14.0867 | 13.6000 | 0.9747 | 0.0001
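The diversity table combines summary statistics over per-pair scores with simple per-response word counts. This report does not define those measures, so the following is only a minimal Python sketch under stated assumptions: words are lowercase whitespace tokens, vocabulary richness is the per-response ratio of unique words to total words (averaged across responses), and the pairwise score is a hypothetical stand-in (1 minus the Jaccard similarity of word sets), not the pipeline's actual diversity measure. All function names are illustrative.

```python
from itertools import combinations
from statistics import mean, pstdev


def tokenize(text):
    # Assumption: lowercase whitespace tokenization (punctuation stays attached).
    return text.lower().split()


def text_stats(responses):
    """Average length, unique-word count, and per-response vocabulary richness."""
    lengths, uniques, richness = [], [], []
    for text in responses:
        words = tokenize(text)
        n_unique = len(set(words))
        lengths.append(len(words))
        uniques.append(n_unique)
        # Assumption: richness = unique words / total words (type-token ratio).
        richness.append(n_unique / len(words) if words else 0.0)
    return {
        "average_response_length": mean(lengths),
        "average_unique_words": mean(uniques),
        "average_vocabulary_richness": mean(richness),
    }


def jaccard_distance(a, b):
    """Hypothetical stand-in for the pipeline's pairwise diversity score."""
    wa, wb = set(tokenize(a)), set(tokenize(b))
    union = wa | wb
    return 1.0 - len(wa & wb) / len(union) if union else 0.0


def summarize_pairwise(responses, distance=jaccard_distance):
    """Average/min/max/std of a distance over all unordered response pairs."""
    scores = [distance(a, b) for a, b in combinations(responses, 2)]
    if not scores:
        raise ValueError("need at least two responses")
    return {
        "average_diversity": mean(scores),
        "min_diversity": min(scores),
        "max_diversity": max(scores),
        # Population std; the pipeline may use the sample std instead.
        "std_diversity": pstdev(scores),
    }


if __name__ == "__main__":
    jokes = [
        "Why don't scientists trust atoms? Because they make up everything!",
        "Why don't scientists trust atoms? Because they make up everything!",
        "What do you call a fake noodle? An impasta!",
    ]
    print(text_stats(jokes))
    print(summarize_pairwise(jokes))
```

Under this sketch, any pair of verbatim repeats (like the direct examples above) scores 0, which is one way a run ends up with a Min Diversity of 0.0000. Averaging per-response ratios also differs from dividing the average unique-word count by the average length (12.2267 / 12.4533 ≈ 0.982 for direct, versus the reported 0.9907), which is consistent with richness being averaged per response.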

📈 Diversity Visualizations

Aggregate Metrics
Pairwise Diversities Distribution
Response Length Distribution
Unique Words Distribution
Vocabulary Richness Distribution

N-gram Results

Note: the detailed pairwise_rouge_l_scores array is omitted from this table.
Experiment | Average ROUGE-L | Min ROUGE-L | Max ROUGE-L | Std ROUGE-L | Average Response Length
direct (samples=1) | 0.7010 | 0.0000 | 1.0000 | 0.3855 | 12.4533
sequence [strict] (samples=5) | 0.1720 | 0.0000 | 1.0000 | 0.2982 | 13.9800
multi_turn [strict] (samples=5) | 0.2178 | 0.0000 | 1.0000 | 0.2771 | 13.9000
structure_with_prob [strict] (samples=5) | 0.1695 | 0.0000 | 1.0000 | 0.2909 | 14.0867
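The n-gram table summarizes pairwise ROUGE-L scores (the pairwise_rouge_l_scores array noted above). ROUGE-L is a similarity built on the longest common subsequence (LCS) of two token sequences, so higher values mean more overlap and lower values mean more varied responses. This report does not state the pipeline's exact implementation (tokenizer, stemming, or whether it reports F1, precision, or recall); the Python sketch below assumes lowercase whitespace tokens and the F1 form, and its function names are illustrative.

```python
from itertools import combinations
from statistics import mean


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbor.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two texts, using lowercase whitespace tokens (assumption)."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def pairwise_rouge_l(responses):
    """ROUGE-L F1 for every unordered pair of responses."""
    return [rouge_l_f1(a, b) for a, b in combinations(responses, 2)]


if __name__ == "__main__":
    jokes = [
        "Why don't scientists trust atoms? Because they make up everything!",
        "Why don't scientists trust atoms? Because they make up everything!",
        "What do you call a fake noodle? An impasta!",
    ]
    scores = pairwise_rouge_l(jokes)
    print(f"average={mean(scores):.4f} min={min(scores):.4f} max={max(scores):.4f}")
```

Read this way, the direct run's Average ROUGE-L of 0.7010 reflects many near-verbatim repeats, consistent with the identical atom jokes in the generation examples, while the three samples=5 methods fall between 0.1695 and 0.2178.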

📈 N-gram Visualizations

Aggregate Metrics
Pairwise ROUGE-L Scores Distribution
Response Length Distribution

Joke Quality Results

Experiment | Funniness | Cleverness | Originality | Structure | Overall | Normalized Overall
direct (samples=1) | 3.5733 | 3.7200 | 2.2267 | 4.2733 | 3.4457 | 0.6891
sequence [strict] (samples=5) | 3.4133 | 3.7467 | 2.5600 | 4.2267 | 3.4480 | 0.6896
multi_turn [strict] (samples=5) | 3.3600 | 3.7400 | 2.4067 | 4.1467 | 3.3823 | 0.6765
structure_with_prob [strict] (samples=5) | 3.4067 | 3.7600 | 2.4267 | 4.1600 | 3.4120 | 0.6824
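In every row of the joke quality table, Normalized Overall equals Overall divided by 5 up to 4-decimal rounding, which suggests the overall score sits on a 5-point scale; the report does not state the scale, so treat that as an assumption. The short Python check below recomputes the column from the reported values. Overall is not the plain mean of the four criterion columns (for direct, that mean is about 3.4483 versus the reported 3.4457), so the check covers only the Overall-to-Normalized step.

```python
# (Overall, Normalized Overall) pairs copied from the Joke Quality table.
REPORTED = {
    "direct (samples=1)": (3.4457, 0.6891),
    "sequence [strict] (samples=5)": (3.4480, 0.6896),
    "multi_turn [strict] (samples=5)": (3.3823, 0.6765),
    "structure_with_prob [strict] (samples=5)": (3.4120, 0.6824),
}

# Assumption: scores are on a 5-point scale, so normalizing divides by 5.
MAX_SCORE = 5


def normalize(overall, max_score=MAX_SCORE):
    return overall / max_score


for name, (overall, normalized) in REPORTED.items():
    # Both columns are rounded to 4 decimals, so compare with a small tolerance.
    assert abs(normalize(overall) - normalized) < 1e-4, name
    print(f"{name}: {overall:.4f} / {MAX_SCORE} = {normalize(overall):.4f}")
```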

📈 Joke Quality Visualizations

Aggregate Metrics
Cleverness Score Distribution
Funniness Score Distribution
Originality Score Distribution
Structure Score Distribution

📈 Overall Comparison

📈 Comparison Chart Visualizations

Comparison Chart